{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-msvsr模型简介\n",
    "视频超分源于图像超分，其目的是从一个或多个低分辨率（LR）图像中恢复高分辨率（HR）图像。它们的区别也很明显，由于视频是由多个帧组成的，所以视频超分通常利用帧间的信息来进行修复。PP-MSVSR是一种多阶段视频超分深度架构，具有局部融合模块、辅助损失和细化对齐模块，以逐步细化增强结果。具体来说，在第一阶段设计了局部融合模块，在特征传播之前进行局部特征融合, 以加强特征传播中跨帧特征的融合。在第二阶段中引入了一个辅助损失，使传播模块获得的特征保留了更多与HR空间相关的信息。在第三阶段中引入了一个细化的对齐模块，以充分利用前一阶段传播模块的特征信息。实验证实，PP-MSVSR在Vid4数据集性能优异，仅使用 1.45M 参数PSNR指标即可达到28.13dB！\n",
    "\n",
    "PP-MSVSR模型由飞桨官方出品，是PaddleGAN自研的视频超分模型。\n",
    "更多关于PaddleGAN可以点击https://github.com/PaddlePaddle/PaddleGAN 进行了解。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 模型效果及应用场景\n",
    "### 2.1 视频超分任务：\n",
    "\n",
    "#### 2.1.1 数据集：\n",
    "\n",
    "数据集以常用的视频超分验证数据集Vid4为例。\n",
    "\n",
    "#### 2.1.2 模型效果速览：\n",
    "\n",
    "PP-MSVSR在图片上的超分效果为：\n",
    "\n",
    "<div align='center'>\n",
    "<img src='https://user-images.githubusercontent.com/48054808/144848981-00c6ad21-0702-4381-9544-becb227ed9f0.gif' width = \"80%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 模型如何使用\n",
    "\n",
    "### 3.1 模型推理：\n",
    "* 下载 \n",
    "\n",
    "（不在Jupyter Notebook上运行时需要将\"！\"或者\"%\"去掉。）\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/work\n",
    "# 克隆PaddleGAN（从gitee上更快）,本项目已经做持久化处理，不用克隆了。\n",
    "!git clone https://github.com/PaddlePaddle/PaddleGAN.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 安装"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 运行脚本需在PaddleGAN目录下\n",
    "%cd ~/work/PaddleGAN/\n",
    "\n",
    "# 安装所需依赖项【已经做持久化处理，无需再安装】\n",
    "!pip install -r requirements.txt\n",
    "\n",
    "# 运行脚本需在PaddleGAN目录下\n",
    "%cd ~/work/PaddleGAN/\n",
    "\n",
    "# 开始安装PaddleGAN \n",
    "!python setup.py develop  #如果安装过程中长时间卡住，可中断后继续重新执行，"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 快速体验\n",
    "\n",
    "恭喜！ 您已经成功安装了PaddleGAN，接下来快速体验视频超分效果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 在GPU上预测一段视频\n",
    "# 低分辨率视频下载地址：https://user-images.githubusercontent.com/79366697/200290225-7fdd364c-2fbe-48b6-a3bf-87349aedec98.mp4\n",
    "%cd ~/work/PaddleGAN/applications/\n",
    "!python tools/video-enhance.py --input demo/Peking_input360p_clip6_5s.mp4 \\\n",
    "                               --process_order PPMSVSR \\\n",
    "                               --output output_dir"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行完成后会在output_dir文件夹下生成一个超分后的视频。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 模型训练：\n",
    "* 克隆PaddleGAN仓库（详见3.1）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 准备数据集\n",
    "\n",
    "这里介绍4个视频超分辨率常用数据集，REDS，Vimeo90K，Vid4，UDM10。其中REDS和vimeo90k数据集包括训练集和测试集，Vid4和UDM10为测试数据集。将需要的数据集下载解压后放到``PaddleGAN/data``文件夹下 。\n",
    "\n",
    "  REDS（[数据下载](https://seungjunnah.github.io/Datasets/reds.html)）数据集是NTIRE19比赛最新提出的高质量（720p）视频数据集，其由240个训练片段、30个验证片段和30个测试片段组成（每个片段有100个连续帧）。由于测试数据集不可用，这里在训练集选择了四个具有代表性的片段（分别为'000', '011', '015', '020'，它们具有不同的场景和动作）作为测试集，用REDS4表示。剩下的训练和验证片段被重新分组为训练数据集（总共266个片段）。\n",
    "\n",
    "  处理后的数据集 REDS 的组成形式如下:\n",
    "  ```\n",
    "    PaddleGAN\n",
    "      ├── data\n",
    "          ├── REDS\n",
    "                ├── train_sharp\n",
    "                |    └──X4\n",
    "                ├── train_sharp_bicubic\n",
    "                |    └──X4\n",
    "                ├── REDS4_test_sharp\n",
    "                |    └──X4\n",
    "                └── REDS4_test_sharp_bicubic\n",
    "                     └──X4\n",
    "              ...\n",
    "  ```\n",
    "\n",
    "  Vimeo90K（[数据下载](http://toflow.csail.mit.edu/)）数据集是Tianfan Xue等人构建的一个用于视频超分、视频降噪、视频去伪影、视频插帧的数据集。Vimeo90K是大规模、高质量的视频数据集，包含从vimeo.com下载的 89,800 个视频剪辑，涵盖了大量场景和动作。\n",
    "\n",
    "  处理后的数据集 Vimeo90K 的组成形式如下:\n",
    "  ```\n",
    "    PaddleGAN\n",
    "      ├── data\n",
    "          ├── Vimeo90K\n",
    "                ├── vimeo_septuplet\n",
    "                |    |──sequences\n",
    "                |    └──sep_trainlist.txt\n",
    "                ├── vimeo_septuplet_BD_matlabLRx4\n",
    "                |    └──sequences\n",
    "                └── vimeo_super_resolution_test\n",
    "                     |──low_resolution\n",
    "                     |──target\n",
    "                     └──sep_testlist.txt\n",
    "              ...\n",
    "  ```\n",
    "\n",
    "  Vid4（[数据下载](https://paddlegan.bj.bcebos.com/datasets/Vid4.zip)）数据集是常用的视频超分验证数据集，包含4个视频段。\n",
    "\n",
    "  处理后的数据集 Vid4 的组成形式如下:\n",
    "  ```\n",
    "    PaddleGAN\n",
    "      ├── data\n",
    "          ├── Vid4\n",
    "                ├── BDx4\n",
    "                └── GT\n",
    "              ...\n",
    "  ```\n",
    "\n",
    "  UDM10（[数据下载](https://paddlegan.bj.bcebos.com/datasets/udm10_paddle.tar)）数据集是常用的视频超分验证数据集，包含10个视频段。\n",
    "\n",
    "  处理后的数据集 UDM10 的组成形式如下:\n",
    "  ```\n",
    "    PaddleGAN\n",
    "      ├── data\n",
    "          ├── udm10\n",
    "                ├── BDx4\n",
    "                └── GT\n",
    "              ...\n",
    "  ```\n",
    "以REDS数据集为例，通过以下命令确认数据集已经准备完成。\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 查看解压目录\n",
    "#%cd ~/work/PaddleGAN/\n",
    "#!tree -d data/REDS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 修改yaml配置文件\n",
    "\n",
    "\n",
    "\n",
    "修改配置文件``` configs/msvsr_reds.yaml```\n",
    "\n",
    "```\n",
    "total_iters: 150000  # 总的训练迭代次数\n",
    "output_dir: output_dir  # 模型参数保存路径\n",
    "find_unused_parameters: True\n",
    "checkpoints_dir: checkpoints\n",
    "use_dataset: True\n",
    "# tensor range for function tensor2img\n",
    "min_max:\n",
    "  (0., 1.)\n",
    "\n",
    "model:\n",
    "  name: MultiStageVSRModel\n",
    "  fix_iter: 2500\n",
    "  generator:\n",
    "    name: MSVSR\n",
    "    mid_channels: 32\n",
    "    num_init_blocks: 2\n",
    "    num_blocks: 3\n",
    "    num_reconstruction_blocks: 2\n",
    "    only_last: True\n",
    "    use_tiny_spynet: True\n",
    "    deform_groups: 4\n",
    "    stage1_groups: 8\n",
    "    auxiliary_loss: True\n",
    "    use_refine_align: True\n",
    "    aux_reconstruction_blocks: 1\n",
    "    use_local_connnect: True\n",
    "  pixel_criterion:\n",
    "    name: CharbonnierLoss\n",
    "    reduction: mean\n",
    "\n",
    "dataset:\n",
    "  train:\n",
    "    name: RepeatDataset\n",
    "    times: 1000\n",
    "    num_workers: 6\n",
    "    batch_size: 2  #建议使用单机8卡训练，每个卡batch_size为2\n",
    "    dataset:\n",
    "      name: VSRREDSMultipleGTDataset\n",
    "      lq_folder: data/REDS/train_sharp_bicubic/X4  # 训练数据低分辨率图像路径\n",
    "      gt_folder: data/REDS/train_sharp/X4  # 训练数据高分辨率图像（GT）路径\n",
    "      ann_file: data/REDS/meta_info_REDS_GT.txt  # 数据集信息文档\n",
    "      num_frames: 20  # 训练时模型的输入视频帧数\n",
    "      preprocess:\n",
    "        - name: GetNeighboringFramesIdx\n",
    "          interval_list: [1]\n",
    "        - name: ReadImageSequence\n",
    "          key: lq\n",
    "        - name: ReadImageSequence\n",
    "          key: gt\n",
    "        - name: Transforms\n",
    "          input_keys: [lq, gt]\n",
    "          pipeline:\n",
    "            - name: SRPairedRandomCrop\n",
    "              gt_patch_size: 256\n",
    "              scale: 4\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomHorizontalFlip\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomVerticalFlip\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomTransposeHW\n",
    "              keys: [image, image]\n",
    "            - name: TransposeSequence\n",
    "              keys: [image, image]\n",
    "            - name: NormalizeSequence\n",
    "              mean: [0., 0., 0.]\n",
    "              std: [255., 255., 255.]\n",
    "              keys: [image, image]\n",
    "\n",
    "  test:\n",
    "    name: VSRREDSMultipleGTDataset\n",
    "    lq_folder: data/REDS/REDS4_test_sharp_bicubic/X4  # 测试/验证数据低分辨率图像路径\n",
    "    gt_folder: data/REDS/REDS4_test_sharp/X4  # 测试/验证数据高分辨率图像（GT）路径\n",
    "    ann_file: data/REDS/meta_info_REDS_GT.txt  # 数据集信息文档\n",
    "    num_frames: 100  # 测试/验证时模型的输入视频帧数\n",
    "    test_mode: True\n",
    "    preprocess:\n",
    "        - name: GetNeighboringFramesIdx\n",
    "          interval_list: [1]\n",
    "        - name: ReadImageSequence\n",
    "          key: lq\n",
    "        - name: ReadImageSequence\n",
    "          key: gt\n",
    "        - name: Transforms\n",
    "          input_keys: [lq, gt]\n",
    "          pipeline:\n",
    "            - name: TransposeSequence\n",
    "              keys: [image, image]\n",
    "            - name: NormalizeSequence\n",
    "              mean: [0., 0., 0.]\n",
    "              std: [255., 255., 255.]\n",
    "              keys: [image, image]\n",
    "\n",
    "lr_scheduler:\n",
    "  name: CosineAnnealingRestartLR\n",
    "  learning_rate: !!float 2e-4  # 学习率\n",
    "  periods: [150000]\n",
    "  restart_weights: [1]\n",
    "  eta_min: !!float 1e-7\n",
    "\n",
    "optimizer:\n",
    "  name: Adam\n",
    "  # add parameters of net_name to optim\n",
    "  # name should in self.nets\n",
    "  net_names:\n",
    "    - generator\n",
    "  beta1: 0.9\n",
    "  beta2: 0.99\n",
    "\n",
    "validate:\n",
    "  interval: 5000  # 多少个迭代次进行验证\n",
    "  save_img: false  # 验证时是否保存图像\n",
    "\n",
    "  metrics:\n",
    "    psnr: # metric name, can be arbitrary\n",
    "      name: PSNR  # 验证指标 PSNR\n",
    "      crop_border: 0\n",
    "      test_y_channel: false\n",
    "    ssim:\n",
    "      name: SSIM  # 验证指标 SSIM\n",
    "      crop_border: 0\n",
    "      test_y_channel: false\n",
    "\n",
    "log_config:\n",
    "  interval: 10  # 每多少个迭代次显示\n",
    "  visiual_interval: 5000  # 多少个迭代次可视化\n",
    "\n",
    "snapshot_config:\n",
    "  interval: 5000  # 多少个迭代次保存一次\n",
    "\n",
    "export_model:\n",
    "  - {name: 'generator', inputs_num: 1}\n",
    "\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleGAN/\n",
    "%export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8\n",
    "#开始训练\n",
    "!ppython -m paddle.distributed.launch tools/main.py --config-file configs/msvsr_reds.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 模型评估\n",
    "\n",
    "模型评估也是使用配置文件```configs/msvsr_reds.yaml```以REDS数据集为例，需要下载好REDS数据集的验证数据集并且解压到 PaddleGAN/data/REDS/下，在配置文件中更改验证数据集路径后，使用如下命令进行评估。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleGAN/\n",
    "%env CUDA_VISIBLE_DEVICES=0\n",
    "#训练完以后，进行评估\n",
    "#模型参数下载地址：https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams\n",
    "!python tools/main.py --config-file configs/msvsr_reds.yaml --evaluate-only --load ${PATH_OF_WEIGHT}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 模型原理\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 与单幅图像超分（SISR）任务不同，视频超分（VSR）任务的关键是充分利用跨帧之间的互补信息来重建高分辨率图像序列。由于来自不同帧的图像具有不同的运动和场景，准确的对齐多个帧和有效的融合不同帧一直是 VSR 任务的研究重点。为了利用相邻帧的丰富互补信息，PP-MSVSR设计了一种多阶段 VSR 深度架构，包括局部融合模块、辅助损失和再对齐模块来逐步增强超分效果。\n",
    "* PP-MSVSR结合了滑动窗口方法和循环网络方法的思想，使用多阶段策略进行视频超分，设计了局部融合模块、辅助损失和细化对齐模块以逐步细化增强结果。\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302378-cd0a5f1b-5d89-44a5-aae3-8f4c587a3575.png\" width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "* 受滑动窗口方法思想的启发，PP-MSVSR在第一阶段设计了一个局部融合模块LFM。该模块在特征传播之前先进行局部特征融合，以加强特征传播中的跨帧特征融合。 具体来说，LFM的目的是让当前帧的特征先融合其相邻帧的信息，然后将融合后的特传给下一阶段的传播模块。\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302612-6ac31db1-ac32-4b14-af3e-479f66746ac9.png\" width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "* 受到循环网络的启发，PP-MSVSR在第二阶段使用与 basicvsr++ 相同的双向循环结构来融合和传播特征，并设计了一个辅助损失，使特征更接近真实高分辨率特征空间。 具体来说，对第二阶段传播后的特征上采样后添加辅助损失。\n",
    "* 与图像超分不同，视频超分通常需要将相邻帧与当前帧对齐以更好地整合相邻帧的信息。 在一些大型运动视频超分任务中，对齐的作用尤为明显。在使用双向循环网络的过程中，往往会有多次相同的对齐操作。为了充分利用之前对齐操作的结果，PP-MSVSR提出了细化对齐模块RAM，可以利用之前对齐的参数并获得更好的对齐结果。\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302619-87b10406-580b-4a65-a6bd-bbb1525005b1.png\" width = \"80%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 相关论文以及引用信息\n",
    "```\n",
    "@article{jiang2021PP-MSVSR,\n",
    "  author = {Jiang, Lielin and Wang, Na and Dang, Qingqing and Liu, Rui and Lai, Baohua},\n",
    "  title = {PP-MSVSR: Multi-Stage Video Super-Resolution},\n",
    "  booktitle = {arXiv preprint arXiv:2112.02828},\n",
    "  year = {2021}\n",
    "  }\n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
